Data Driven Grammatical Error Detection in Transcripts of Children's Speech

نویسندگان

  • Eric Morley
  • Anna Eva Hallin
  • Brian Roark
چکیده

We investigate grammatical error detection in spoken language, and present a data-driven method to train a dependency parser to automatically identify and label grammatical errors. This method is agnostic to the label set used, and the only manual annotations needed for training are grammatical error labels. We find that the proposed system is robust to disfluencies, so that a separate stage to elide disfluencies is not required. The proposed system outperforms two baseline systems on two different corpora that use different sets of error tags. It is able to identify utterances with grammatical errors with an F1-score as high as 0.623, as compared to a baseline F1 of 0.350 on the same data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The effect of sampling on estimates of lexical specificity and error rates.

Studies based on naturalistic data are a core tool in the field of language acquisition research and have provided thorough descriptions of children's speech. However, these descriptions are inevitably confounded by differences in the relative frequency with which children use words and language structures. The purpose of the present work was to investigate the impact of sampling constraints on...

متن کامل

Grammatical Error Detection for Corrective Feedback Provision in Oral Conversations

The demand for computer-assisted language learning systems that can provide corrective feedback on language learners’ speaking has increased. However, it is not a trivial task to detect grammatical errors in oral conversations because of the unavoidable errors of automatic speech recognition systems. To provide corrective feedback, a novel method to detect grammatical errors in speaking perform...

متن کامل

Integrating imperfect transcripts into speech recognition systems for building high-quality corpora

The training of state-of-the-art automatic speech recognition (ASR) systems requires huge relevant training corpora. The cost of such databases is high and remains a major limitation for the development of speech-enabled applications in particular contexts (e.g. low-density languages, or specialized domains). On the other hand, a large amount of data can be found in news prompts, movie subtitle...

متن کامل

Grammatical error detection from English utterances spoken by Japanese

This paper describes methods to recognize English utterances by Japanese learners as accurately as possible and detects grammatical errors from the transcription of the utterances. This method is a building block for the voice-interactive Computer-Assisted Language Learning (CALL) system that enables a learner to make conversation practice with a computer. A difficult point for development of s...

متن کامل

A Model of Early Syntactic Development

AMBER is a model of first language acquisition that improves its performance through a process of error recovery. The model is implemented as an adaptive production system that introduces new condition-action rules on the basis of experience. AMBER starts with the ability to say only one word at a time, but adds rules for ordering goals and producing grammatical morphemes, based on comparisons ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014